Hard problems in similarity searching
Abstract
The Closest Substring Problem is one of the most important problems in the field of computational biology. It is stated as follows: given a set of $t$ sequences $s_1, s_2, \ldots, s_t$ over an alphabet $\Sigma$, and two integers $k, d$ with $d \le k$, can one find a string $s$ of length $k$ and, for all $i = 1, 2, \ldots, t$, substrings $o_i$ of $s_i$, all of length $k$, such that $d(s, o_i) \le d$ for all $i = 1, 2, \ldots, t$? (Here, $d(\cdot, \cdot)$ denotes the Hamming distance.) Closest Substring was shown to be NP-hard [9] and W[1]-hard with respect to the number $t$ of input sequences [7]; recently, a significant number of results concerning the parameterized computational complexity of Closest Substring were added in [6]. In this paper we introduce and analyze two variants of the Closest Substring Problem, obtained by imposing restrictions on the pairwise distances between the substrings $o_i$. The bounded Hamming distance constraint asks that $d(o_i, o_j) \le p$ for all $i, j \in \{1, 2, \ldots, t\}$ (where $p < 2d$ is a given constant) and yields the problem called BCCS. The sum-of-pairs constraint asks that $\sum_{1 \le i < j \le t} d(o_i, o_j) \le P$ (where $P < dt(t-1)$ is a given constant) and yields the problem called SCCS. We motivate the introduction of these problems, and we show that while SCCS is very close to Closest Substring, BCCS is a non-trivial restriction of Closest Substring that is more suitable for certain practical applications. We then concentrate on BCCS and show that all the hardness results available for Closest Substring remain valid for BCCS even when the parameter $p$ is restricted to a certain range.
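To make the three problem statements concrete, the following is a minimal brute-force sketch in Python. It is not the paper's method: it simply enumerates every candidate string $s$ over the alphabet and every choice of substrings $o_i$, so it is exponential and only usable on toy instances. The function names (`hamming`, `windows`, `closest_substring`) and the optional arguments `p` and `P` are assumptions introduced for illustration; leaving both as `None` gives plain Closest Substring, setting `p` adds the BCCS constraint, and setting `P` adds the SCCS constraint.

```python
from itertools import combinations, product


def hamming(a, b):
    """Hamming distance between two strings of equal length."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def windows(seq, k):
    """All length-k substrings of seq, in left-to-right order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def closest_substring(seqs, alphabet, k, d, p=None, P=None):
    """Exhaustive search (illustrative only, exponential time).

    Returns (s, [o_1, ..., o_t]) with hamming(s, o_i) <= d for all i,
    or None if no such choice exists.
    If p is given, also require hamming(o_i, o_j) <= p for all pairs (BCCS).
    If P is given, also require the sum of pairwise distances <= P (SCCS).
    """
    for center in map("".join, product(alphabet, repeat=k)):
        # For each sequence, keep the windows within distance d of the center.
        near = [[w for w in windows(seq, k) if hamming(center, w) <= d]
                for seq in seqs]
        if any(not candidates for candidates in near):
            continue
        for choice in product(*near):
            pair_dists = [hamming(a, b) for a, b in combinations(choice, 2)]
            if p is not None and any(x > p for x in pair_dists):
                continue
            if P is not None and sum(pair_dists) > P:
                continue
            return center, list(choice)
    return None


# Toy instance (made up for illustration).
seqs = ["AAAT", "TAAC", "CAAG"]
print(closest_substring(seqs, "ACGT", k=3, d=1))        # plain Closest Substring
print(closest_substring(seqs, "ACGT", k=3, d=1, p=0))   # BCCS with p = 0
```

By the triangle inequality, any two substrings within distance $d$ of the same string $s$ are within distance $2d$ of each other, which is why the BCCS constraint only restricts the problem when $p < 2d$.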
Similar resources
An Improvement in WRP Block Replacement Policy with Reviewing and Solving its Problems
One of the most important factors in file system performance is efficient buffering of disk blocks in main memory. Efficient buffering helps to reduce the wide speed gap between main memory and hard disks. In this buffering system, the block replacement policy is one of the most important design decisions, as it determines which disk block should be replaced when the buffer is full. To o...
Entity Matching in Vector Spatial Data
Entity matching is a crucial and difficult technique in vector spatial data integration, data updating, and map difference analysis. To address the shortcomings of current matching algorithms, this paper studies three aspects in depth: the candidate search algorithm for entity matching, the similarity measure, and the matching strategy. It proposes an area entity searc...
Relational Databases Query Optimization using Hybrid Evolutionary Algorithm
Optimizing database queries is one of the hard research problems. Exhaustive search techniques like dynamic programming are suitable for queries with a few relations, but as the number of relations in a query increases, they require too much memory and processing to be practical, so random and evolutionary methods have to be used instead. The use of evolutionary methods, beca...
Approximate search strategies for weighted trees
The problems of (classical) searching and connected searching of weighted trees are known to be computationally hard. In this work we give a polynomial-time 3-approximation algorithm that finds a connected search strategy of a given weighted tree. This in particular yields constant factor approximation algorithms for the (non-connected) classical searching problems and for the weighted pathwidt...
Solving approximate similarity queries
Nearest neighbor and range searching are among the most important and fundamental problems in computational geometry because of their numerous important application areas [1, 2]. In particular, in many modern database applications, high-dimensional searching problems arise when complex objects are represented by vectors of d numeric features. As the dimension d increases hig...
Journal:
- Discrete Applied Mathematics
Volume 144
Publication date: 2004